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Fig. 1. Reber's radio survey. We can see the Milky way Galaxy, Cygnus A 
and Cassiopeia A. Image courtesy of NRAO/AUI 



Abstract — Next generation radio telescopes will be much larger, 
more sensitive, have much larger observation bandwidth and will 
be capable of pointing multiple beams simultaneously. Obtaining 
the sensitivity, resolution and dynamic range supported by the 
receivers requires the development of new signal processing 
techniques for array and atmospheric calibration as well as new 
imaging techniques that are both more accurate and computa- 
tionally efficient since data volumes will be much larger. This 
paper provides a tutorial overview of existing image formation 
techniques and outlines some of the future directions needed 
for information extraction from future radio telescopes. We 
describe the imaging process from measurement equation until 
deconvolution, both as a Fourier inversion problem and as an 
array processing estimation problem. The latter formulation 
enables the development of more advanced techniques based on 
state of the art array processing. We demonstrate the techniques 
on simulated and measured radio telescope data. 



I. Introduction 

The field of radio astronomy is a relatively young field of 
observational astronomy, dating back to pioneering research by 
Jansky in the 1930s Q who demonstrated that radio waves 
are emitted from the Milky Way galaxy. Inspired by his work, 
Reber [ 2 ] made the first radio survey of the sky using a radio 
telescope that he built in his backyard. Figure [T] depicts some 
results of his radio survey, including the strong radio emissions 
of Cygnus A (Cyg A) and Cassiopeia A (Cas A). In 1946 Ryle 
and Vonberg [3 ] used the Michelson interferometer to observe 
radio emissions from the sun at a frequency of 175 MHz. 
Ryle continued to construct interferometers located on rails, 
which allowed him to create a synthetic aperture by moving 
the antennas. This is the origin of modern inverse synthetic 
aperture radar (ISAR) and the active synthetic aperture radar 
(SAR) imaging. Subsequently, the study of radio emissions 
from celestial sources has led to many great discoveries such 
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Fig. 2. The Allen Telescope Array. Image courtesy of Seth Shostak and the 
SETI institute. 



as cosmic microwave background radiation by Penzias and 
Wilson [4 ] and its anisotropy 1 5 ] and pulsars, which are rapidly 
rotating neutron stars, by Bell et al. |6|. Other phenomena 
of great interest for radio astronomers include gravitational 
lenses where the gravitational field of a massive object serves 
as a lens by bending the light wave (many of the gravitational 
lenses were discovered in radio frequencies [}, active galactic 
nuclei such as in Virgo A (also known as M87^] , and 
supernova remnants such as Cassiopeia A. Radio astronomy 
also deals with spectral lines that appear at radio frequencies 
such as the Hydrogen spectral line which was first detected in 
1951 This spectral line is expected to play an important 
role in understanding the reionization of the universe when 
the first galaxies were formed. In 1962 the principle of 
synthesis aperture imaging using earth rotation was proposed 
by Ryle . Ryle's idea was simple and beautiful. Instead 
of moving the antennas as he has been doing for about 15 
years, he used the fact that the earth rotates to generate the 
synthetic aperture. This quickly became the main operating 
mode of radio interferometers. However, imaging using earth 
rotation synthesis radio telescopes is an ill-posed problem due 
to the irregular sub-Nyquist sampling of the Fourier domain. 
This sub- sampling results in aliasing inside the image due 
to the high sidelobes of the array response. To solve this 
problem we need to remove the effect of the instrumental 
response from the image (a process known as deconvolution) 
and to compensate for inaccuracies in the array response 
(known as self calibration, but it has many similarities to 

1 see [r^tp://www.aoc.mao.edu/^smyers/class.html] 

2 Virgo A is a giant galaxy in the Virgo cluster which has jets of particles 
moving at relativistic speeds and emitting very strong radio waves. It is 
believed that the center of the Virgo A galaxy is a very massive black hole 

3 The spectral line at 21 cm is created by a change in the energy state of 
neutral hydrogen 
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Fig. 3. Cygnus A image. False color image of the radio jet and lobes in the 
hyperluminous radio galaxy Cygnus A. Red shows regions with the brightest 
radio emission, while blue shows regions of fainter emission. Image courtesy 
of NRAO/AUI by Perley et al. 0. 



blind deconvolution). It is important to understand that the 
improved imaging capability is a result of better equipment 
in conjunction with new imaging techniques. Each generation 
of radio telescopes involved significant hardware development 
effort. However, exploiting the hardware capabilities requires a 
constant improvement in imaging and self calibration to match 
the receiver sensitivity. Figure [3] (By Perley et al. (8))presents 
the outcome of imaging and self calibration applied to an 
image of Cygnus A. It is the first discovery of the radio 
jets going from the center all the way to the external radio 
lobes. Even though Cygnus A has been observed for many 
years (since Reber's time) it is the image formation and self 
calibration algorithms that allowed the discovery of the radio 
jets. 

Over the last 40 years many deconvolution techniques have 
been developed to solve this problem. The basic idea behind 
a deconvolution algorithm is to exploit a-priori knowledge 
about the image. The first algorithm and the most popular of 
these techniques is the CLEAN method proposed by Hogbom 
ifTOl . The maximum entropy algorithm (MEM) with various 
entropy functions was proposed in ATI . fl2l . 031 and ifTH 
and the current implementation by Cornwell and Evans fT5l 
is the most widely used. Beyond these two techniques there 
are several extensions in various directions: extensions of the 
CLEAN algorithm to support multi-resolution and wavelets 
as well as non co-planar arrays and multiple wavelengths 
(see the overview paper lfT6l ). MEM techniques have been 
also extended to take into account source structure through 
the use of multiresolution and wavelet based techniques ifTTl . 
Global non-negative least squares was proposed by Briggs 
(El, matrix based parametric imaging such as the Least 
Squares Minimum Variance Imaging (LS-MVI) and maximum 
likelihood based techniques in fT9l and l20l and sparse 
Li reconstruction in lf2TTl and (22). Source modeling is an 
important issue and various techniques to improve modeling 
over simple point source models by using shapelets, wavelets 
and Gaussians |23 ] have been implemented. A more extensive 
overview of classical techniques and implementation issues is 
given in lf24l or l25l 

Better performance analysis of imaging as well as self 
calibration techniques is one of the major challenges for 
the signal processing community. This is likely to become 



a more critical problem for the future generation of radio 
interferometers that will be built in the next two decades such 
as the square kilometer arra)^] (SKA), the Low Frequency 
Arra>0(LOFAR), the Allen Telescope Arra>0(ATA, see figure 
[2]), the Long Wavelength Arra>[[] (LWA) and the Atacama 
Large Millimeter Arra)jj(ALMA). These radio-telescopes will 
be composed of many stations (each station will be made 
up of multiple antennas that are combined using adaptive 
beamforming). These radio-telescopes will have significantly 
increased sensitivity and bandwidth, and some of them will 
operate at much lower frequencies than existing radio tele- 
scopes. Improved sensitivity will therefore require a much 
better calibration, the capability to perform imaging with much 
higher dynamic range in order to reduce the effect of the 
residuals of powerful foreground sources inside and outside 
the field of view and better handling of non-coplanar arrays. 

The structure of the paper is as follows: We begin with 
a description of the basic imaging equation and a paramet- 
ric reformulation of the problem. The basic approaches to 
imaging, assuming a calibrated array are described next. We 
then describe some modifications of the parametric imaging 
techniques that make it possible to combine imaging and 
calibration through semi-definite programming. We end up 
with some simulated and measured examples. 

II. The imaging equations 

This section reviews the basic principles of radio astronomy 
following Taylor et al. |25). In radio astronomy we observe 
the radio waves emitted from space. Since the source is far 
away, the received electromagnetic field intensity distribution 
can be observed only in an angular direction (no information 
regarding the intensity distribution in the radial direction). 
Defining the celestial sphere as the maximal sphere that 
contains no radiating sources, the observed intensity is the 
projection of the source intensity on the celestial sphere. For 
simplicity we will deal with a quasi monochromatic wave at 
frequency v (the general case can be easily derived by a linear 
combination of quasi monochromatic waves). The electric field 
at location r is given by: 



E„(r) = J e„(q) 



3 27rjzy|q-r|/c 



-dS 



(1) 



where e u (q) is the electric field at location q (on the celestial 
sphere), dS is surface area on the sphere and the integration is 
done over the entire sphere and c is the speed of light. For two 
antennas observing a distant source (receiving the electric field 
emitted by the source) there is a geometrical delay in one of the 
antennas relative to the other antenna derived from the source 



observation angle (see Figure |4(a)| ); if the geometric delay is 
compensated by an electronic delay, the electric field received 
in one antenna should be highly correlated with the electric 
field received by the other antenna. The spatial coherency of 



4 http://www.skatelescope.org/ 
5 http://www.lofar.org/p/astronomy.htm 

6 http : //r al .berkeley . edu/ ata/ 

7 http : //lwa. unm . edu/ 

8 http://www.almaobservatory.org/index.php 
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Fig. 4. Measurement setting. |4(a)| - The visibility is the measurement of 
spatial correlation between the electric field of antenna pairs. The geometric 
delay of the wave that propagates fro m the source to the two antennas is 
compensated for by an electronic delay. |4(b)] - A distant source is observed by 
an antenna pair. The baseline connecting the two antennas is the origin of the 
(u, v, w) coordinate system. The w axis points from the baseline toward the 
source reference point, (u, v ) are perpendicular to w and selected according 
to the Earth's orientation. (l,m,n) is a unit vector in the (u,v,w) system 
pointing toward a specific locati on in the sou rce (at the source reference point 
So, I = 0, m = ) and n = y^Z 2 +m 2 ). 



Fig. 5. (u, v) coverage of a simulated East- West radio telescope with 14 
antennas logarithmically spaced with baselines up to 200A. Observations are 
made every 6 minuets for a duration of 12 hours. From each antenna pair we 
get an ellipse in the (u, v) domain, u and v are in A units. 



the electric field for two antennas located at ri and r 2 is given 
by 

V r I/ (r 1 ,r 2 ) = (E I/ (r 1 )E*(r 2 )>. (2) 

where () stands for the expectation value. Substituting |T]) into 
^ and taking into account the large distance of the source; i.e. 
|q — r| w |q| and that the electric field is spatially incoherent 
(i.e., Mqi)e*(q 2 )) = Vqi ^ q 2 ) we get 



K(ri,r 2 ) = J I u (s)e- 2 ^< r ^^ c dn, 



(3) 



where I u (s) = (e u (s) 2 } is the source intensity at direction s 
on the sphere (s = y^jj), and dQ = y^p. Representing (|3j) in 
the (u,v,w) coordinate system, for many astronomical ooser- 
vations (e.g., planar arrays, or small field of view imaging) we 
obtain 



V„(u,v) 



I u (l,m)e- 27rj ( ul + vm) dldm. 



(4) 



The visibility is the Fourier transform of the source intensity; 
therefore the inverse relation holds : 



I v (l,m) 



V u (u,v)e 27rj ( ul + vm) dudv. 



(5) 



When the co-planar approximation does not hold, equation 
^ takes the more complicated form 



^dldr 



where 



Vi-i 2 



(6) 



(7) 



For a source with visibility measurements covering the 
entire (u,v) domain, the source image is perfectly computed 
by the Fourier inversion of the visibility. In practice, only 
a small part of the (u,v) domain is measured by sampling 
the existing antenna pair baselines as they change with the 
Earth's rotation relative to the (u,v) coordinates (at time tk 
two antennas p and q measure a single visibility point in the 
(u,v) domain at (Up q ,Vp q )) . This set of samples is known 



as the (u, v) coverage of the radio telescope. This coverage is 
determined by many factors such as the configuration in which 
the individual receptors (telescopes or dipole) are placed on the 
ground, the minimal and maximal distance between antenna 
pairs, the time difference between consecutive measurements 
and the total measurement time and bandwidth. An example 
of the (u,v) coverage for a simulated radio telescope (East 
west array with 14 antennas logarithmically spaced from A to 
200 A, observation time of 12 hours) is shown in Figure [5] The 
sampled points in the (u, v) plane are a collection of ellipses. 
The sampling effect on the resulting image is shown in Figure 
6(a)| and |6(c)| Figure |6(a)| depicts an image of visibility data 
measured over a dense and uniform grid in the (u,v) plane 
(all grid points in the (u,v) plane were sampled). Figure 6(c) 
presents the same data with a more realistic (iz, v) sampling. 
The image with the partial (and more realistic) measurement 
set is blurred, distorted and noisy. Let S(u,v) be the sampling 
function (S(u,v) = 1 for each measured (u,v) pair and 
S(u,v) = otherwise). We obtain that the inverse direct 
Fourier transform of the measured visibility, known as the dirty 
image Id is given by: 



Id, 



,(/,m) = J J V u (u,v)S(u,v)e 2 ^ ul+vm ^dudv. (8) 

The instrument point spread function which is also known as 
the dirty beam is defined by: 



B(Z,m) = J J S(u,v)e 2 ^ ul+vrn) dudv. 



(9) 



By the convolution theorem, the dirty image is the convolution 
of the true source intensity §5§ and the dirty beam ([9]): 



L D,v 



(10) 



This is the reason why image reconstruction algorithms in 
radio astronomy are often referred to as deconvolution al- 
gorithms, since direct synthesis produces Id,v, but we want 
to obtain I v by deconvolution with respect to B. The dirty 
image can be calculated from the measured visibility data 
according to equation d8l), or by using a Fast Fourier Transform 
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measurements were located on the center of a grid cell). In 



(a) 




(b) 




(c) 

Fig. 6. Illustration of sampling and gridding effects. [6(a)] Fourier transform 
of visibility data on a perfect grid (visibility measurements l ocation are on 
the center of the grid cells) with complete (u, v) coverage. |6(b)| Fourier 
transform of visibility data with off-grid points (the visibility measurements 
location were chosen randomly) and complete (u, v) coverage, demo nstrates 
gridding effect. Features in the image are blurred and distorted. |6(c)| Fourier 
transform of visibility data with perfect grid and incomplete (u,v) coverage 
(of the radio telescope described in Figure [5j, demonstrates the sampling 
effect. 



(FFT) to reduce the calculation time and memory resources. 
In order to use the FFT, the visibility data must lie on a 
rectangular equally spaced grid. This procedure of re-sampling 
the measured visibilities on a regular grid is called gridding. 
The weighting is done by convolving the visibilities with a 
smooth kernel (this procedure is also called convolutional 
gridding). Choice of the gridding kernel is important and 
follows from standard interpolation theory. An illustration of 
the gridding effect for a rectangular kernel is shown in Figures 
|6(a)| and |6(b)| Both images were generated using simulated 
visibility data with complete (u, v) coverage. In Figure 6(a) 
the visibility data were taken on a perfect grid (all visibility 



Figure |6(b)| the location of the visibility measurements was 
chosen randomly within the cells in the (u,v) plane). This 
results in a blurred and distorted image. For more details on 
gridding and tapering the reader is referred to [24] and (25) . 

III. The parametric matrix formulation of the 

IMAGE FORMATION PROBLEM 

We now describe an alternative formulation of the image 
formation problem. In this formulation imaging is viewed 
as a parameter estimation problem, where the locations and 
powers (and possibly polarization parameters and frequency 
dependence of power) are the unknown parameters. This 
formalism was first proposed in (26) and (T9) to allow for 
the introduction of interference mitigation techniques in the 
imaging process. It was extended to non co-planar array and 
polarimetric imaging in (20). This formulation also allows 
easy inclusion of space dependent calibration parameters (27). 
Assume that the observed image is a collection of D point 
sources, i.e., 

D 

J„(Z,ra) = y^I u (l,m)5(l - l d )6(m - m d ). (11) 



d=l 



Since (u,v) are the baseline coordinates (i.e. u 



u? - 



Uj and v = = v\ 



as 



•Vj ), the visibility ^ can be rewritten 



D 



d=l 

where k denotes the measurement time Selecting a (time 
varying) reference point at one of the antennas (uq,Vq) and 
manipulating ( [T2| ) yields 

VMj^ij) = Etie- 2 ^ ( <° /d+ ^° md) /,(^,m d ) 

e 2Tvj{u k j0 l d +v k jQ m d ) 

(13) 

We define the /c'th measurement correlation matrix R/c by: 



(Rfc)^- =K(4,4) 



(14) 



The correlation matrix is illustrated in Figure [7J for a single 
frequency bin. Cell of the correlation matrix is the 
visibility measurement at time from antenna pair The 
size of the correlation matrix R& is pxp where p is the number 
of antennas in the array. The autocorrelation of each antenna 
is also used (the diagonal of the correlation matrix). When 
an observation uses more than a single frequency bin, each 
correlation matrix is computed using a single bin as illustrated 
in Figure [8] 

Let the Fourier component vector at time be: 



/ -27rj(ul l+vl m) \ 



a fe (/,m) 



V 



-27rX< 2+«£ m) 



J 



and let the Fourier component matrix at time tk be 



afc(Zi,rai), 



,afc(Zd,rad) 



(15) 



(16) 
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R k - Correlation Matrix at t u 




Fig. 7. The correlation matrix (for the simple case of a single frequency) 
in a specific time measurement tk is built from the visibility measurement of 
antenna pairs 



A./ 




Fig. 8. The correlation matrices for an observation in the multi frequency 
case. The correlation matrices are calculated for each frequency separately 
(on filtered measurements from the antennas). 



Define the point source intensity matrix by : 

/(li,mi) 



B 







I(l D ,m D ) 



(17) 



Using ( 15 )-( 17 ) equation ( 13 ) can be rewritten as 



AfeBA 



H 



(18) 



Matrix equation (18) is the parametric form of the classical 
equation ([4]). It will allow us to consider the problem as 
an estimation problem, where we observe a set of measured 
covariance matrices which depend smoothly on the unknown 
source and instrument calibration parameters, as well as re- 
ceiver noise. Using this formulation we can easily use well- 
known techniques from estimation theory (such as maximum 
a-posteriori, ML, MVDR and robust techniques) to solve the 
image formation problem. It also enables a simple extension 
to the non-coplanar array case as well as polarimetric imaging 
and multi-frequency synthesis, where sources have frequency 
dependent (parametrically known) characteristics. The classi- 



cal dirty image ([8} can be rewritten as 

J D (Z, m) = ^ J2 ™) R fc a fc(^ m )- ( 19 > 

k 

Note that this is identical to the mean power output of a 
classical beamformer oriented towards direction (Z,m). More 
realistically, the antenna response varies slightly between 
different antennas and there is an additional noise per antenna. 
The antenna response can be measured prior to the observation 
and taken into account in the model. Since the noise in 
two antennas is independent, the noise correlation matrix is 
diagonal. Denoting by 7^ the unknown complex gain of 
antenna i at observation time tk and by a 2 the noise variance, 
the correlation matrix now becomes: 



where 



r fe A fc BAfrf 



7i,fe 



a 2 I 



(20) 







Estimation of the 7^ is discussed in a companion paper by 
Wijnholds et al. (28). Note that typically 7^ varies slowly so 
it can be assumed to be constant over multiple times. Similarly, 
the non-coplanar array case is given by replacing a& and B 
by 



a fe (/,m) 



e -27rj(uf j0 Z+vf j0 m+it;J B >0 n) 



e -27rj(n^ Z+^ )0 m+ W { ; ;0 n) 



-1 T 



and 



I(h, mi) 
yj\-i\-m 







(21) 



(22) 



The radio imaging problem can now be reformulated as fol- 
lows: Given a set of measured covariance matrices Ri, 
estimate the parameters si,...,S£>, ),•••, I( s d) an d the 
calibration matrices : k = 1,...,K. Note that ( [20| ) 
can be easily generalized to deal with direction dependent 
calibration parameters, polarized sources as well as multi- 
frequency synthesis. All that we need to change is the source 
and the calibration parametric model by simple adaptation 
of ( [20] ). The parametric approaches described in this paper 
can be applied uniformly to all these problems. However, for 
simplicity we will focus on the calibrated array case. 

IV. Classical and parametric approaches based on 

SEQUENTIAL SOURCE REMOVAL 

Many algorithms in radio astronomy are based on sequential 
source removal. The most commonly used is the CLEAN 
algorithm originally proposed by Hogbom [10]. These iterative 
algorithms assume that the observed field is a collection 
of sources with simple structure. CLEAN assumes that the 
sources are point sources. During each iteration a single 
point source is estimated and removed from the data. The 
reconstructed image is the collection of all point sources with 
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Initialization: 

• Calculate the dirty image Id according to measured visibilities. 

• Calculate the reconstruction beam B rec for later use. 
While stopping criteria not met: 

• Find the brightest location in the dirty image (li,mi). 
This is the location of a new point source. 

• Estimate the new point source intensity A^. 

• Add the new point source to the source list, 
(with the estimated intensity). 

• Remove the new source response from the data 

(both the dirty image and the visibility measurements). 
Finalize: 

• Calculate the reconstructed image ±rec 

by convolving the source list with the reconstruction beam. 

TABLE I 

Generic source removal algorithm flow 



Initialization: 

• Calculate Id (eq. |8}. 

• i = 0. 

• £>rec = Gaussian. 
While Id is not noise-like: 

• (li,rrii) = arg max 7d(Z, ra). 

• Xi = I D (h,rni). 

• For all p, q, k: 

• Update I D (eq. [8j. 

• i = i -\- 1. 
Finalize: 

• Tree = Ip + ^jjXjBrecjl — lj,m — TJlj). 

TABLE II 

Visibility domain CLEAN algorithm flow 



their estimated power convolved with an ideal reconstruction 
beam (usually an elliptical Gaussian fitted to the central lobe 
of the dirty beam) . The general structure common to all the 
sequential source removal algorithms is described in Table [I] 
The algorithms differ from each other by the exact definition of 
the dirty image used, the way the point source is removed from 
the image (either in the image domain after gridding or in the 
visibility domain), the intensity estimation method of the point 
sources and the exact modeling of the sources (point source, 
Gaussian, wavelet coefficients, shapelets, etc.). Some versions 
like the Cotton-Schwab technique estimate multiple sources 
based on the same dirty image. This significantly accelerates 
the algorithm, since the number of Fourier transforms of the 
image is reduced. 

We describe two sequential source removal algorithms. The 
first is the CLEAN algorithm and the second is a parametric 
estimation based algorithm known as LS-MVI. 

A. The CLEAN Algorithm 

The CLEAN algorithm assumes that the observed field 
of view is composed of point sources. Since the image of 
a point source is given by the convolution of the point 
source and the dirty beam ( [TO} , CLEAN iteratively removes 
the brightest point source from the image until the residual 
image is noise-like. There are several variants of CLEAN 
((MED,(I1,EDX The CLEAN algorithm is implemented 
either in the image or in the visibility domain. In each iteration 
the brightest point in the dirty image (equation ([5]) ) is found 
(position and strength) and added to a point source list. A 
fraction of it (7, < 7 < 1) is removed from the dirty 
image. The 7 parameter is called the loop gain and is usually 
taken to be 0.1-0.2. The iterations continue until the residual 
image is noise like. The subtraction can be done either in 
the image domain or in the visibility domain. The visibility 
domain CLEAN is more accurate since we are not limited to 
pixel resolution. The algorithm flow for ungridded visibility 
domain CLEAN is summarized in Table [III 

An illustration of the CLEAN algorithm on a simulated 
image is shown in Figure d9]). The simulated radio telescope 
is the same as in Figure §5j. The loop gain used is 0.2. In 
every iteration the strongest point source is found, added to 
the reconstructed image and subtracted from the dirty image. 



The loop gain serves three purposes. First, it prevents (or 
at least reduces) the effects of over-estimation of the power 
due to sidelobes from other sources. Second, it allows for 
interpolation of sources that are located off the grid. Third, 
it improves performance with extended sources. However, 
this limits the dynamic range of the image. The effect of 
pixelization and choice of grid on the dynamic range of the 
imaging process is further discussed in (32), (33). Improved 
versions of CLEAN allow for estimation of location off the 
grid by using interpolation, and subtraction of the effect from 
the visibility rather than the dirty image. This has the positive 
effect of eliminating gridding accuracy effects. Acceleration of 
the CLEAN algorithm can be achieved by estimating multiple 
point sources based on a single dirty image (major cycle), as 
well as defining windows for the search procedure. Practically, 
defining windows reduces the size of the search space. 

1 ) Clark CLEAN Algorithm: One of the important variants 
of CLEAN was proposed by Clark in 1980 (30). Clark's 
algorithm main advantage is reduction of computational load. 
The algorithm is performed in two cycles, a major cycle and 
a minor cycle. A major cycle is constructed by selecting 
intensity limit value (according to a histogram of the dirty 
image values) and approximating a dirty beam (central patch 
of the true dirty beam) to be used during the subsequent 
minor cycles. A minor cycle consists of finding the brightest 
pixel in the image (i.e. a new point source) and removing a 
fraction of the point source response from the dirty image. 
In principle, the minor cycle is the same as described in the 
'While' loop in Table [TTJ when the dirty beam used is only 
the centeral patch of the full dirty beam (hence computational 
complexity is significantly reduced). The inaccuracies caused 
by working with an approximated dirty beam are corrected 
during the major cycle. The Clark algorithm is performed in 
the visibility domain instead of the image domain, yielding 
a multiplication instead of a convolution for calculating the 
point source response. 

2) Cotton Schwab Algorithm: Cotton & Schwab (3D devel- 
oped a variant of the Clark CLEAN. Like the Clark CLEAN, 
in the Cotton Schwab CLEAN the procedure involves major 
and minor cycles. The main improvements over the Clark 
algorithm are that the Cotton Schwab algorithm calculation is 
done over the ungridded visibility data, thus avoiding gridding 
errors, and multi source removal is done independently in each 
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(b) Initial Dirty Image 




(c) Reconstructed Image - 50 iter 



(d) Residual Image - 50 iter 





(e) Reconstructed Image - 25000 il 



(f) Residual Image - 25000 il 



Fig. 9. CLEAN steps for a simulated image, (a) the original image, (b) the 
initial dirty image, (c) and (e) the reconstructed image after 50 and 25000 
CLEAN iterations respectively. (d),(f) the residual dirty images after 50 and 
25000 iterations respectively. In general, the sources are nicely reconstructed 
except from the square ring extended source. All images share the same color 
map. All images were up-sampled by 4 using Matlab basic interpolation. 



are combined taking the constant w value into account. 

In the general case (projection of any w values) the relation 
between V(u, v, w) and V(u, v, w = 0) is given by 



V(u, v, w) — G(u, v, w) * V(u, v, w = 0) 



where 



G(Z, m, w) 
G(u, v, w) 



e -27rj[w(Vl-l 2 -m 2 -l) 
e 7rj[w(l 2 +m 2 )] 

W 



(23) 



(24) 



and G(l,m,w) is the Fourier transform of G(v,u,w) called 
the W-projection function. Given a model of the sky bright- 
ness, the visibility on the w = plane can be calculated 
using the two dimensional Fourier transform. The visibility 
measurement outside the w = plane may then be calcu- 
lated using the convolution function G(u,v,w). Note that 
representing the visibility as a convolution and using the 
FFT algorithm to compute the convolution is similar to the 
one-dimensional chirp z-transform algorithm. Calculating the 
image for a given set of visibility measurements is done using 
iterative algorithms since there is no inverse transform. The W- 
projection is a minor-major cycle algorithm that receives three 
dimensional visibility measurements V(u,v,w) and projects 
the w coordinate 'out' (projection on w = plane). The 
2D visibilities V(u, v,w = 0) are used to calculate the 
reconstructed image in the (Z, m) domain by a two dimensional 
Fourier Transform. Then a deconvolution is performed (such as 
CLEAN) on the resulting image. The W-projection algorithm 
has both high performance and high computational speed. 



minor cycle (from different fields). The CLEAN components 
from all fields are removed in the major cycle. Working with 
the ungridded visibility measurement is done using a measure- 
ment list as described in Table An element V(u^ q , v^ q ) of 
the measurement list is the measured visibility by an antenna 
pair (p, q), corresponding to a baseline {u PA ,v PA ) measured 
at time k. 



B. The W-Projection algorithm 

One of the main limitation of the previous technique is 
the case of non-coplanar arrays and large field of view. To 
overcome problems related to non-coplanar arrays the W- 
projection algorithm has been proposed by Corn well et al. 

El. 

The W-projection algorithm deals with non-coplanar ar- 
rays i.e. when the planar approximation is violated and the 
imaging equation is given by equation ([6]). Originally Frater 
and Docherty [ 35 1 showed that a projection of visibility 
measurements from a constant w plane to w = plane can 
be done. This corresponds to a radio telescope with antennas 
arranged in a plane with a single antenna outside the plane. 
In this case the measured visibilities are projected onto w = 
plane (real and imaginary part separately), a deconvolution is 
performed (such as CLEAN) and the resulting cleaned images 



C. The LS-MVI algorithm 

We now describe a recent approach that enables the use 
of modern array processing algorithms in the framework 
of image deconvolution. The method will be demonstrated 
on simulated and measured data. However, in contrast to 
the CLEAN algorithm it is in initial research stages and 
further development of the technique is an interesting research 
problem. The LS-MVI algorithm is a novel matrix based 
sequential source removal algorithm originally proposed in 
fT9l and further improved in [20]. It is based on matrix based 
approach to direction-of-arrival (DOA) estimation techniques. 



We would like to replace the vectors afc(s) in ( 19) by a set of 
beamforming vectors Wk(s),k = 1,...,K. The main goal of 
the LS-MVI is to eliminate interference from other points in 
the image when estimating the location and power of a given 
source. To that end, filterbank techniques such as the MVDR 
and its extensions have proven very effective. Minimizing the 
interference from sidelobes of the dirty beam while observing 
a point source in direction s = (/, m) can be formulated as a 
constrained beamforming problem (For simplicity we denote 
Wfc(s) by Wfc and assume that w = [wi, w^] T ). 



w(s) = argmin w ^f =1 w^R fe w fc 
subject to 

Ef=i w fc a fc(s) = i. 



(25) 
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The solution is given by 

Wfc(s) 

where, /3(s) = 



Ef =1 a^(s)R^a fe (s) 



/^Rj^a^s), (26) 
, a/c(s) is given in equation 



( 15 ) and H k is the co variance matrix measured at time t k . The 



vectors w(s) have different magnitudes for different values of 
s. This is undesirable since it generates noise related spatial 
features. Therefore, the adapted angular response (AAR) so- 
lution normalizes the norm of w to 1. The resulting solution 
is given by 



l£ AR (l,m) 



(27) 



This modified dirty image replaces the classical dirty image 
in the LS-MVI deconvolution process. 

The intensity estimation used by the LS-MVI algorithm is 
a LS estimation of a point source at location m) and given 
by the following equation: 



a = argmin^^^, ||R fe ■ 
subject to a > 



tta fe (l,m)af(/,m) 



(28) 



This estimate of the source power has been independently used 
in ASP-CLEAN (36). The closed form solution of equation 



( 28 ) is given by 



(h H r 1 



(29) 



where 

h = [vec(ai(Z, m)af r (Z, m)) T , vec(a.K(l, ra)a|£(Z, m)) T ] T 

i T 

are obtained by stack- 



and 



vec(Ri) T , v6c(TIk) t 
ing the array response and the measured covariance matrices 
respectively. 

The intensity estimation can be improved by adding the 
semi-definite constraint 



R^ — cr 2 I — aajfe(Z, m)a^(Z, m) 0. 



(30) 



The intensity estimation is bounded between the solution ( 29 ) 
and 0. Hence, a better intensity estimation can be achieved 
using a simple bi-section. A summary of the LS-MVI algo- 



rithm is given in Table (III). Another improvement that has 
low computational complexity is to use a joint LS estimate 
of all previously estimated sources. Assuming that we have 
collected L components the estimator is given by: 



OL 



argmin a J2 k =i ll 3 ^ 
s.Lctj > for all i 



EL I I 9 

i=1 aiq ki \\ 



(31) 



where a [ai, a L ], r k vec(R k ) and q ki 
vec(aLk(li,rni)a.k(k,rni)). Similarly to the CLEAN algo- 
rithm this improvement can be implemented only at major 
cycles, after several sources have been estimated. 

There are two main differences between LS-MVI and 
CLEAN. First, the LS-MVI uses a different type of dirty 
image and second, the LS-MVI performs a more sophisticated 
intensity estimation than CLEAN. The dirty image used by the 
LS-MVI is J^ AR given in equation (127]). The main advantage 
of the AAR dirty image over simple MVDR is the isotropic 



Initialization: 

• R° =R fc , Vfc = 

• Calculate 7^ AR using eq. pTJ 

• 2 = 

• B rec = Gaussian 
While Id is not noise like: 

• (li,rrii) = argmax/^ AR (i,m) 

• Estimate oti according to eq.|29| 

• Optionally improve oti estimation according to eq. {30} 

,^Vk = l...K 

• Calculate I§ AR using R* fc + 

• 2 = 2 + 1 

Finalize: 



• In 



TABLE III 
LS-MVI ALGORITHM FLOW 



noise response that prevents the formation of spatially varying 
noise related artifacts. In [20 ] further extensions for enforcing 
semi-definite constraints in a Cotton-Schwab type of iteration 
are also presented. It should also be noted that there is no 
need to compute the complete dirty image in order to find 
the maximum and optimization techniques can do this much 
faster, especially if the user can provide windows similar to 
CLEAN windows currently used by radio astronomers. Like 
CLEAN, the LS-MVI should be implemented in the visibility 
domain to eliminate gridding effects. 

V. Global optimization based techniques 

We now turn to a second family of solutions to the image 
formation problem. These solutions are based on optimizing a 
global property of the image subject to goodness of fit to the 
data. They vary from least squares (LS) based techniques to 
maximum entropy and i\ based reconstruction. 

A. Linear deconvolution 

Computationally the simplest way to solve the image forma- 
tion problem is through linear inversion. There are two main 
approaches in this area: The well known Least Squares (LS) 
technique and Linear Minimum Mean Square Error (LMMSE). 
Such techniques can work well when the (u,v) coverage 
is good and the inversion is well conditioned. Furthermore 
linear inversion can work independently of the complexity 
of the source structure. However, linear techniques can result 
in significant noise enhancement in ill-posed problems. For a 
fully sampled visibility domain, these techniques can provide 
a first approximation to the image. To overcome this problem 
one can use a constrained LS, also known as non-negative LS 
(NNLS), first proposed for radio synthesis imaging by Briggs 
|[T8l . The idea is that the image is positive. Putting these 
constraints into the deconvolution, yields a computationally 
expensive, though feasible algorithm. An excellent overview 
of the implementation of the NNLS can be found in [18]. 

B. Maximum Entropy image reconstruction 

The maximum entropy image formation technique is one 
of the two most popular deconvolution techniques in radio 
astronomy (together with CLEAN). The maximum entropy 
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principle was first proposed by Jaynes (37l . A good overview 
of the philosophy behind the idea can be found in [38]. Since 
then it has been used in a wide spectrum of imaging problems. 
The basic idea behind MEM is the following: Out of all the 
images which are consistent with the measured data where the 
noise distribution does not satisfy the positivity demand,i.e., 
the sky brightness is a positive function, consider only those 
that satisfy the positivity demand. From these select the one 
that is most likely to have been created randomly. This idea 
was also proposed by ifTTTl for optical images and applied to 
radio astronomical imaging in (T2l . Other approaches based 
on differential entropy have also been suggested lfT3l and lfT4l . 
An extensive collection of papers discussing these different 
methods and aspects of maximum entropy can be found in a 
number of papers in (39). (40l provides an overview of various 
maximum entropy techniques and the use of the various 
options for choosing the entropy measure. Interestingly, in that 
paper, a closed form solution is given for the noiseless case, 
but not for the general case. 

The approach in fi~2ll begins with a prior image and iterates 
between maximizing the entropy function and updating the 
X 2 fit to the data. The computation of the image based on a 
prior image is done analytically, but at each step the model 
visibilities are updated, through a two-dimensional Fourier 
transform. This type of algorithm is known as a fixed point 
algorithm, since it is based on iterating a function until it 
converges to a fixed point. 

The maximum entropy solution is given by solving the 
following Lagrangian optimization problem [12]: 



T MEM 



arg max - 



]l(l,m) lo£ 



1(1, m) A 
F(l,m) 2 



x 2 (n 



where 



V(u,v) 



X 



W)= E ; 

(u,v)eA 



V(u, v) — V(u, v) 



(32) 



(33) 



are the model based visibilities, A is a Lagrange 
multiplier for the constraint that V(u,v) should match the 
measured visibilities V(u,v), A is the (u,v) coverage of the 
radio telescope and F(l,m) is a reference image. Taking the 
derivative with respect to 1(1, m) we obtain that the solution 
is given by: 



1(1, m) = exp (—1 + logF(Z, m) + AA(Z, m)) 



(34) 



where 

A((,m) 



(u,v)eA 



((V(u,v)-V(u,vj) 



3 27r(ul + vr n ) 



The basic maximum entropy algorithm now proceeds by 
choosing an initial image model (typically a flat image or a 
low resolution image) computing the model based visibilities 
V(u,v) on a grid A. Using these visibilities a new model 
image is computed by equation ( [34] ). New visibilities are 
computed from the new model and the process is iterated until 
convergence. 

While it is known that for the maximum entropy, this 
approach usually converges, the convergence can be slow 



(40l . Improved methods based on the Newton method and 
the Conjugate Gradient technique were put forward by (T5l . 
BTlL (42). These methods perform direct optimization of the 
entropy function subject to the x 2 constraint. Generalization 
of the maximum entropy using wavelets and multi-resolution 
techniques have also been proposed (see e.g., ifTTl . (431). 

C. Compressed sensing and sparse reconstruction techniques 

Recently there has been growing interest in using t\ based 
cost functions for deconvolution (see (2T1 and (22l and un- 
published notes by Schwardt). This renewed interest in l\ 
comes from recent results related to compressed sampling 
using Fourier bases. It is worth noting that as early as 1987 
Marsh and Richardson (44l proved that the CLEAN algorithm 
can be regarded as an t\ minimization for a single point source 
image. i\ is not the only criterion. Recovery of noisy and 
blurred images using total variation (TV) optimization for 
smooth images was discussed by Dobson and Santosa (451 . 
Chen et al. [46] dealt with t\ minimization of an image basis 
to achieve image sparseness using linear programming. Feuer 
and Nemirovski (47l and Elad and Bruckstein [48 ] established 
sufficient and necessary conditions for replacing £ optimiza- 
tion (computing the sparsest solution with high computational 
complexity) by linear programming when searching for the 
unique sparse representation. Rudelson and Vershynim (49l 
proved the best known guarantees for exact reconstruction of 
a sparse signal from its Fourier measurements. 

Radio astronomical image reconstruction is done based on 
the visibility measurement in the (u,v) domain. Reconstruc- 
tion of the source image 1(1, m) is equivalent to estimating the 
missing visibility points. The missing V(u, v) measurements 
together with the image itself are estimated by minimizing 
a cost function ||/(Z, ra)!!^ in the (l,m) domain using the 
constraints of image positivity and the measured visibility data. 
Note that since 1(1, m) is a positive quantity we have: 



\\I(l,m) 



N N 

EE 

1=1 m=l 



1(1, m) 



(35) 



which allows us to use linear programming. To solve the re- 
construction problem fast, we represent the problem as a linear 
programming problem with real variables. To that end let (•, •) 
be a one-to-one pairing function mapping {0, N — 1} x 
{0, N - 1} onto {0, N 2 - 1}. Let F be an N 2 x N 2 
matrix whose elements satisfy 



(l,m) , (u,v) 



Let £ = vec(V) and let t = vec(I). We have 

£ = Ft. 



(36) 



(37) 



Note that t is a real vector since the visibility measurements 
satisfy V(u,v) = V(—u,—v). To make the problem real 
we define F# = Re(F),Fj = Im(F) and variables £ R = 
R e (0?£/ = I m (0- (37} now becomes 



Zr = 



F/t 



(38) 
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For the measured locations {ui,Vi) we have: 



i R {{ui, Vi)) = Re (v(u h Vj^J i = 1, ^ 
Vi)) = Im (v^u*, Vi)J z = 1, M 

where M is the number of given measurements in the (u,v) 
domain. The linear programming problem is described in Table 
llV] (for more details the reader is referred to [21]). In lf22l a 



i=l ti 

Subject to 

£ R ({ui,Vi)) = Re (v(ui,Vi)^ 

€i((ui,Vi)) = Im (y(ui,ViY) 
< t 



TABLE IV 

l\ OPTIMIZATION USING LINEAR PROGRAMMING 



joint i\ and £2 is also discussed. This makes it possible to 
include prior knowledge on the noise power. Using the total 
variation is also a possibility that leads to i\ optimization. Note 
that using total variation and maximum entropy are related 
since both functionals impose smoothness on the image. 



VI. Self calibration and robust MVDR for 

SYNTHETIC APERTURE ARRAYS 

We now turn to the case where the array response is not 
completely known, but we have some statistical knowledge of 
the error, e.g., we know the co variance matrix of the array 
response error at each epoch (measurement time). Typically 
this covariance will be time invariant or will have slow 
temporal variation. In this case we extend the robust dirty 
image as described in [50] to the synthetic aperture array 
case. This generalization follows the analysis in [20]. Since the 
positive definite constraint on the residual covariance matrices 
is important in our application we extended the robust Capon 
estimator of l5l1 . To that end assume that at each epoch we 
have an uncertainty ellipsoid describing the uncertainty of the 
array response (as well as unknown atmospheric attenuation). 
This is described by 



(a fc (s) - a fc (s)) H C k (a fe (s) - a fc (s)) < 1 



(40) 



where afc(s) is the nominal value of the array response towards 
the point s and C k are the covariance matrices of the uncer- 
tainty in the calibration parameters at time k. Generalizing the 
LS-MVI we would like to solve the following problem: 



[p,ai,...,a fc ] = argmax p?ai5 ... 5afe p 
subject to 

R fc - a 2 l - pa fe af h 
(a fc (s) - a fe (s)) H C k (a fe (s) - a fe (s)) < 1 k 



k 



(41) 



Let r = 1/p. The problem (41 ) is equivalent to the following 
problem 

[f,ai, ...,a fc ] = argmin T)ai) ... >afc r 
subject to 

R/e - a 2 l a k 



y 



(a fc (s) - a fc (s)) 



H 



(a fc (s) - a fc (s)) 
1 



>-0 k = l,...,K. 



(42) 

This problem is once again a semi-definite programming prob- 
lem that can be solved efficiently via interior point techniques 
l52l . We can now replace the MVDR estimator by this robust 
version. Interestingly, we obtain estimates of the corrected 
array response a(sfc). Using the model we obtain for each 
k 

a fc (s) = T k R k (s) (43) 

Hence, the self-calibration coefficients can be estimated using 
least squares fitting 



f k = arg min V ||a fe (s^) - T k a. k (s £ ) 

"Vi --v„ ' * 



71 v,7p 



(44) 



where T k — diag{7/c 5 i, j k ,p}- Of course, when the self- 
calibration parameters vary slowly we can combine the esti- 
mation over multiple epochs. This might prove instrumental 
in calibration of LOFAR type arrays, where the calibration 
coefficients vary across the sky. Since the computational 
complexity of the self-calibration semi-definite programming 
is higher than that of the MVDR dirty image, it is too 
complicated to solve this problem for each source in the image. 
Hence it should be used in a way similar to the external 
self-calibration cycle ([53]), where this problem is solved 
using a nominal source locations model. The advantage over 
ordinary self-calibration is that beyond the re-evaluation of 
the calibration parameters, we obtain better estimates of the 
source powers, without significant increase in the complexity. 
Another interesting alternative is to use the doubly constrained 
robust Capon beamformer which combines a norm constraint 
as in the AAR dirty image with robust Capon beamforming 

El- 

VII. Examples and Comparisons 

In this section we describe three examples of the various 
algorithms, including a simulated example of an extended 
source, an example from the LOFAR test station and an 
example of Abell 2256 observed by the VL^] 

A. Simulated Extended Source 

An extended source (Figure fTO} ) was simulated using an 
East- West array containing 10 antennas logarithmically spaced 
up to 1000 A. CLEAN deconvloution results are depicted in 
Figure ([TT] a) and ([TT] b). After 100 CLEAN iterations, the 



center of the source is partially reconstructed with distortion. 
After additional 20 iterations an artifact is generated (below 
the strong point on the right). This divergence can often occur 

9 Initial calibration was done by Tracy Clarke 
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Fig. 10. Original extended source image 




Fig. 11. Reconstructed images of the CLEAN and LS-MVI algorithms, (a) 
CLEAN reconstructed image after 100 iterations, (b) CLEAN reconstructed 
image after 120 iterations, (c) LS-MVI reconstructed image after 100 itera- 
tions, (d) LS-MVI reconstructed image afer 300 iterations. 



in CLEAN when applying it to extended sources. The LS-MVI 



results are presented in Figure (11 c) and (11 d). After 100 
iterations the center of the source is reconstructed and after 200 
additional iterations and reconstruction is stable. The reason 
for this is the fact that CLEAN overestimate the power due 
to the high sidelobes level. Further analysis of this example is 
given in [20]. 



B. LOFAR Test Station Data 

The LOFAR test station data were recorded using 25 
frequency bands of 156kHz using 45 antennas (array geometry 
is given in Figure ([12] d)). The data were calibrated by S. 
Wijnholds. The AAR dirty image and the classic dirty image 
are given in Figure fl2fc) and (fT2]3), respectively. Since the 
LOFAR station benefits from a good (u,v) domain coverage, 
the two dirty images are similar. The reconstructed image 



using the LS-MVI algorithm is displayed in Figure (12 c); the 
spurious emission on the right side of the image was removed. 



Initial classic dirty image LOFAR 




lal AAR dirty image LOFAR 



o Cas A 



Reconstructed image with residual LOFAR 
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Fig. 12. LOFAR station images, (a) classic dirty image, (b) AAR dirty image, 
(c) Reconstructed image using AAR based cleaning, (d) Array geometry 



C. Abell 2256 

The last example used VLA data of Abell 2256 The 
data measured by Clarke and Ensslin l56l contain a single 
frequency band around 1369 MHz. The data were processed 
using both CLEAN and LS-MVI algorithms for 30 iterations 
P| We used a visibility domain CLEAN (updates were per- 
formed on the ungridded visibility). The initial data (dirty 



image) of the CLEAN are shown in Figure ([13] a), The strong 
sidelobes structure is clearly visible as large circles in the 
dirty image. In contrast the initial AAR dirty image is shown 



in Figure (13 b). The sidelobes level is much lower and 
several point sources that are invisible in the classical dirty 
image are now visible. The reconstruction using the visibility 
domain CLEAN is shown in Figure ([13] c). The sidelobes 
level is reduced and the source structure is clearly seen. The 
reconstruction using the LS-MVI algorithm is shown in Figure 
fl3]d). Similar to the CLEAN, the sources structure is visible 
and the sidelobes level is significantly reduced. It should be 
emphasized that even though we have used only 30 iterations, 
the strong structure is consistent with that of (56l and (57l . 
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Fig. 13. Abell 2256 images, (a) Initial classic dirty image, (b) Initial 
AAR dirty image, (c) classic CLEAN reconstructed image, (d) LS-MVI 
reconstructed image. 
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